Overcoming statistical machine translation limitations: error analysis and proposed solutions for the Catalan-Spanish language pair
Authors
M. Farrús, M. R. Costa-jussà, J. B. Mariño, M. Poch, A. Hernández, C. Henríquez, J. A. R. Fonollosa
TALP Research Center, Department of Signal Theory and Communications, Universitat Politècnica de Catalunya, C/Jordi Girona 1-3, 08034 Barcelona, Spain
M. R. Costa-jussà: Voice and Language Department, Barcelona Media Innovation Center, Av Diagonal 177, 9th Floor, 08018 Barcelona, Spain
Present address, M. Farrús: Office of Learning Technologies, Universitat Oberta de Catalunya, Av. Tibidabo, 47, 08035 Barcelona, Spain
Present address, M. Poch: Universitat Pompeu Fabra, Roc Boronat, 138, 08018 Barcelona, Spain
Lang Resources & Evaluation (2011) 45:181–208, DOI 10.1007/s10579-011-9137-0
Abstract
This work aims to improve an N-gram-based statistical machine translation system between the Catalan and Spanish languages, trained with an aligned Spanish-Catalan parallel corpus consisting of 1.7 million sentences taken from El Periódico ...
Similar resources
Catalan-English Statistical Machine Translation without Parallel Corpus: Bridging through Spanish
This paper presents a full experiment on large-vocabulary Catalan-English statistical machine translation without an English-Catalan parallel corpus, in the context of the debates of the European Parliament. For this, we make use of an English-Spanish European Parliament Proceedings parallel corpus and a Spanish-Catalan general newspaper parallel corpus, each containing more than 30 M words. G...
Catalan-English statistical machine translation without a parallel corpus
This paper presents a full experiment on large-vocabulary Catalan-English statistical machine translation without an English-Catalan parallel corpus, in the context of the debates of the European Parliament. For this, we make use of an English-Spanish European Parliament Proceedings parallel corpus and a Spanish-Catalan general newspaper parallel corpus, each containing more than 30 M words. G...
English-Catalan Neural Machine Translation in the Biomedical Domain through the cascade approach
This paper describes the methodology followed to build a neural machine translation system in the biomedical domain for the English-Catalan language pair. This task can be considered low-resource in terms of both the domain and the language pair. To address this task, this paper reports experiments on a cascade pivot strategy through Spanish for the neural machine translation usin...
Towards the Use of Word Stems and Suffixes for Statistical Machine Translation
In this paper we present methods for improving the quality of translation from an inflected language into English by making use of part-of-speech tags and word stems and suffixes in the source language. Results for translations from Spanish and Catalan into English are presented on the LC-STAR trilingual corpus which consists of spontaneously spoken dialogues in the domain of travelling and app...
A Large Spanish-Catalan Parallel Corpus Release for Machine Translation
We present a large Spanish-Catalan parallel corpus extracted from ten years of the paper edition of a bilingual Catalan newspaper. The produced corpus of 7.5 M parallel sentences (around 180 M words per language) is useful for many natural language applications. We report excellent results when building a statistical machine translation system trained on this parallel corpus. The Spanish-Catala...
Journal title: Language Resources and Evaluation
Volume: 45
Pages: 181–208
Publication date: 2011